## Figure counter https://stackoverflow.com/questions/13848137/figure-captions-references-using-knitr-and-markdown-to-html
fig <- local({
    i <- 0
    ref <- list()
    list(
        cap=function(refName, text) {
            i <<- i + 1
            ref[[refName]] <<- i
            paste("Figure ", i, ": ", text, sep="")
        },
        ref=function(refName) {
            ref[[refName]]
        })
})

Introduction

Traditional approaches to analyzing brain anatomy focus on localizing the brain regions associated with a given covariate. Particularly in the study of disease or in drug development, it is useful to be able to state that a given disease or drug has an effect on brain anatomy at a specific location. In this document we’re going to evaluate how well simple hierarchical models and a novel model, termed the effect diffusion tree, identify where in the brain effects are occurring. All code for this analysis can be viewed either by opening individual code sections by clicking the code button to the right of the screen, or opened globally with the code menu in the top right corner using the Show all code button.

Brain mapping

To facilitate analysis and discussion of location within the brain, anatomy is often summarized down to a collection of structures via an atlas, defined by expert anatomists. An atlas partitions the brain into non-overlapping structures, much the way a map might partition a country into states or provinces. This has benefits in reducing measurement noise by aggregating observations within a region, and by providing a vocabulary for describing where in the brain an effect occurs (Evans et al. 2012).

Once structures have been defined, many different properties can be measured. Examples range from the amount of gene expression to the geometric properties of the structure depending on the research question. In this analysis, we’re going to focus on the volume of each structure. In our lab volumetric data is automatically derived from magnetic resonance images (MRI) using an automatic labelling technique called MAGeT (Chakravarty et al. 2013).

Regions are typically defined by anatomists according to visible boundaries and shared development. Our lab predominantly works with mouse MRI and over the years has collaboratively produced one of the most detailed high-resolution MRI atlases of the mouse brain [Dorr et al. (2008); Ullmann et al. (2013); Richards et al. (2011); with additional modifications by Qiu and Egan].


Figure 1: The Dorr-Steadman-Ullman-Richards-Egan-Qiu (DSURQE) atlas illustrated on an average mouse brain MRI. Colours indicate distinct structures; the DSURQE atlas identifies 336 structures.


We use this atlas for most of our analyses; this involves summarizing our measures within each region and reporting regions where there is evidence for an effect of interest. It has become increasingly unsatisfying for us to be stuck with these exact structures, and so we’ve begun to consider our structures as part of a brain taxonomy tree.

Brain Taxonomy Trees

From the perspective of shared development, it is natural to consider the structures in an atlas as the leaves of a taxonomy tree. Parent nodes in the tree indicate collections of structures with shared development. These higher order structures are also useful for defining the location of effects. Conveniently for us, a large brain imaging group – the Allen Brain Institute (ABI) – has organized their mouse brain histological atlas in just such a way (Lein et al. 2007).


Figure 2: A screenshot of the Allen Mouse Brain Atlas indicating distinct structures as identified by histology. Note the hierarchical organization present on the left.


The Allen brain taxonomy tree defines structure relations as a recursive subdivision of the brain. The root node of the tree is the whole brain; the first division is by tissue type, dividing the brain into grey and white matter. The grey matter is subdivided into cerebrum and cerebellum, and this subdivision process continues until the structures cannot be further separated into discernible parts. These terminal (leaf) structures are roughly equivalent to the structures in our MRI atlas. By unifying our nomenclature to match the Allen structures, we can combine our MRI atlas with their taxonomy tree. This allows us to divide the mouse brain into a hierarchy of structures, giving us a vocabulary for higher-order structures and additional information about structure relatedness.
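To make the tree structure concrete, here is a minimal sketch of such a taxonomy built with the data.tree package (structure names are illustrative, not the full Allen tree):

```r
library(data.tree)

brain <- Node$new("Brain")
grey  <- brain$AddChild("Grey matter")
brain$AddChild("White matter")
grey$AddChild("Cerebrum")
grey$AddChild("Cerebellum")

## Leaf structures are the finest subdivisions available
leaf_names <- brain$Get("name", filterFun = isLeaf)
```

The leaves here play the role of the atlas structures; everything above them is a higher-order structure.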

The problem

Historically, each region has been treated as essentially independent. Individuals have their brains characterized with MRI, the brains are then aligned to an atlas and the volumes for each region of each subject’s brain are measured. The volumes at each structure are then analyzed with separate linear models.

There are some problems with this approach:

  1. Regions are not independent: proximity in space or similarity by development causes regions to receive similar genetic and environmental effects.
  2. Regions are not homogeneous: structure volume affects variance, with small structures having high noise.
  3. Partitioning effort isn’t uniform across the brain: “interesting regions” get subdivided into smaller regions.
  4. Regions may be too small for reliable measurement.

We aim to improve on this situation with our hierarchy tree in hand.

Bayesian models have been used in neuroimaging for decades, although the typical focus of these models is either to replace the per-structure linear model with a Bayesian model (Friston, Glaser, et al. 2002; Friston, Penny, et al. 2002; Woolrich et al. 2009) or to induce spatial dependence across the brain (Sidén et al. 2017; Penny, Trujillo-Barreto, and Friston 2005). The closest approach to ours is that of Bowman et al. (2008), where they perform a multi-step modelling approach with per-voxel (3D pixel) linear modelling followed by hierarchical modelling of the coefficients within and between structures. No other approach, to our knowledge, has attempted to model additional taxonomic organization of the brain.

The Models

In order to accomplish our goals, we’re going to try three different Bayesian models that will encourage pooling between structures.

  1. A standard hierarchical model fit with rstanarm’s stan_lmer function, with pooling between all structures regardless of position in the brain taxonomy.
  2. A hierarchical model fit with rstanarm that incorporates taxonomy information by pooling effects across parent structures.
  3. A tree-based model where effects diffuse down the tree as a Gaussian random walk.

We’ve termed this third model the effect diffusion tree. In this model, the covariate effect at each child node is the effect at its parent plus some Gaussian noise, encouraging effects to pool by ancestry. It is a small change to the model used in stan_lmer; the distinction will be explained below.

An example model we’d like to fit (using either stan_lmer or the effect diffusion tree) is something of the form:

y ~ grp + (grp | structure) + (1 | individual)

Where y is our volumes, grp is a grouping covariate of interest, structure is the brain structure, and individual is the mouse. We’re claiming that the effect of group depends on the structure of interest with pooling toward the mean, and a random intercept for each mouse mostly to account for brain size.

For simplicity we will assume there are two groups.
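As a toy illustration (values and names fabricated), the long-format data such a formula expects looks like:

```r
## Fabricated example of the data layout for the formula above
d <- data.frame(
  y          = c(1.02, 2.10, 0.98, 2.25),  # structure volumes
  grp        = c("a", "a", "b", "b"),      # grouping covariate of interest
  structure  = c("s1", "s2", "s1", "s2"),  # brain structure
  individual = c("m1", "m1", "m2", "m2")   # mouse ID
)
## Not run here, but a fit would look like:
## rstanarm::stan_lmer(y ~ grp + (grp | structure) + (1 | individual), data = d)
```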

This model amounts to

\[ y_{is} = \boldsymbol{\beta} \mathbf{x}_i^t + \boldsymbol{\beta}_{s} \mathbf{x}_i^t + r_i + \epsilon_{is}\]

with \(i\) indexing individuals and \(s\) indexing structures. Here \(\mathbf{x}_i\) is a two-component vector, one component for the intercept and one for the group. Both \(\boldsymbol{\beta}\) and \(\boldsymbol{\beta}_s\) are two-component vectors, with intercept and group coefficients, and \(r_i\) is a scalar with the per-individual intercept. All \(\epsilon\)’s used in this document represent independent Gaussian noise.

The group specific parameters \(\beta_s\) are parameterized by a non-centered multivariate normal distribution.

\[ \boldsymbol{\beta}_s = \Sigma^{1/2} \boldsymbol{\epsilon}_{s} \]

Where \(\Sigma\) is the 2x2 covariance matrix for the two parameters and \(\Sigma^{1/2}\) is its Cholesky factor.

The effect diffusion model modifies this slightly by centering group specific parameters on the value of their parent.

\[ \boldsymbol{\beta}_s = \boldsymbol{\beta}_{p_s} + \Sigma^{1/2} \boldsymbol{\epsilon}_{s} \]

Where \(p_s\) is the index of the parent node.

Both models use an LKJ prior for \(\Sigma\). For the effect diffusion model this is the covariance parameter for the change in effect associated with taking a step from a parent to a child node. Originally the model had been formulated such that each parent node had its own covariance parameter; unfortunately, these models couldn’t be fit due to divergences, so only one global covariance is estimated.
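A minimal numerical sketch of the two parameterizations, using an illustrative 2x2 covariance rather than anything estimated from data:

```r
set.seed(1)

Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), nrow = 2)  # illustrative (intercept, group) covariance
L <- t(chol(Sigma))                       # lower-triangular Cholesky factor

## stan_lmer-style: structure-specific effects centered at zero
beta_s <- as.vector(L %*% rnorm(2))

## effect diffusion: child effects centered on the parent's effects
beta_parent <- c(0.10, -0.05)             # illustrative parent-node effects
beta_child  <- beta_parent + as.vector(L %*% rnorm(2))
```

The only difference between the two draws is the centering term, which is exactly the modification made by the diffusion model.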

The data

First we will consider the data. Regional brain volumes from traditional MRI analyses were gathered by members of our lab. For this analysis we will use a subset of the mice from our open data release of ~700 mouse brain images (1000+ more coming soon), hosted by the Ontario Brain Institute (https://www.braincode.ca/content/open-data-releases).

The data release includes volumes for each leaf structure arranged as a matrix of subjects by structures. This matrix can be converted into a long-format data frame with columns for subject, structure, and volume. This organization allows easy modelling with R’s formula interface.
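A minimal sketch of this conversion, with a fabricated two-subject matrix and tidyr’s pivot_longer:

```r
library(tidyr)

## Fabricated wide subject-by-structure volumes
vols_wide <- data.frame(subject  = c("m1", "m2"),
                        amygdala = c(10.1, 9.8),
                        thalamus = c(21.5, 22.0))

## One row per subject-structure pair, ready for a formula interface
vols_long <- pivot_longer(vols_wide, -subject,
                          names_to = "structure", values_to = "volume")
```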

The long format data will be processed with the models presented above. But first we’ll compare the performance of the models with some simulations.

Simulations

To check that the effect diffusion tree induces a reasonable prior, we’re going to simulate some data and try to recover our simulated parameters with the effect diffusion tree and the two stan_lmer models.

Init

First we’ll read in some R packages. Most of the utility functions used are in the package hierarchyTrees, distributed with this document. Included are functions for manipulating trees using the data.tree package, producing simulated data for trees, and generating the required data to run the stan_lmer and diffusion tree models. This code also makes heavy use of dplyr and purrr.

knitr::opts_chunk$set(cache.lazy = FALSE)

## If necessary devtools::install_github("cfhammill/hierarchyTrees")

options(mc.cores = 4)
suppressPackageStartupMessages({
  library(hierarchyTrees)
  library(bayesplot)
  library(rlang)
  library(loo)
  ## also attach the packages used throughout the analysis below
  library(dplyr); library(purrr); library(tidyr)
  library(ggplot2); library(data.tree)
  library(rstan); library(rstanarm)
})

set.seed(20180406)

Read In

Now we need to get some data. This is a small subset of 20 male and 20 female control mice. We’ll be simulating the effect of sex on the brain structure taxonomy.

metadata <- readRDS("metadata.rds")

defsFile <- "DSURQE_40micron_R_mapping.csv"
vols_hierarchical <- readRDS("vis_areas.rds")

leaves <- vols_hierarchical$Get("name", filterFun = isLeaf)
parents <- vols_hierarchical$Get("name", filterFun = function(n) !isLeaf(n))

indivs <- metadata$ID
indiv_effects <- map_dbl(indivs, function(i) rnorm(1, 0, .05))  

Inspect the tree

Simulation will be done on a small subset of the whole tree, in particular the visual areas.

fix_names_and_plot(vols_hierarchical)

Figure 3: An interactive graph diagram highlighting the subset of the anatomical tree used for simulations

vols_hierarchical
##                                  levelName
## 1  Visual areas                           
## 2   ¦--Primary visual area                
## 3   ¦   ¦--left Primary visual area       
## 4   ¦   °--right Primary visual area      
## 5   ¦--Lateral visual area                
## 6   ¦   ¦--left Lateral visual area       
## 7   ¦   °--right Lateral visual area      
## 8   ¦--posteromedial visual area          
## 9   ¦   ¦--left posteromedial visual area 
## 10  ¦   °--right posteromedial visual area
## 11  ¦--Anterolateral visual area          
## 12  ¦   ¦--left Anterolateral visual area 
## 13  ¦   °--right Anterolateral visual area
## 14  °--Anteromedial visual area           
## 15      ¦--left Anteromedial visual area  
## 16      °--right Anteromedial visual area

Simulation function

We’re going to define a utility function to create trees given the following characteristics:

  1. noise_sd: What sd should the simulated noise have.
  2. leaf_effects: A vector of effects to add at the leaf nodes.
  3. indiv_effects: A vector of random intercept per individual.
  4. meta: A vector with a single covariate.
  5. base_tree: The tree defining the taxonomy structure.
  6. extra_effects: Additional effects to add at non-leaf nodes. Useful to induce changes that follow the taxonomy.
local_create_tree <- function(noise_sd, leaf_effects
                            , leaf_vols = leaf_volumes
                            , indiv_effs = indiv_effects, meta = metadata$SEX
                            , base_tree = vols_hierarchical
                            , extra_effects = NA){
  leaf_vols <- leaf_effects * 0 ## This argument is ignored now
  create_tree(noise_sd, leaf_effects, leaf_vols, indiv_effs, metadata = meta
            , base_tree = base_tree, extra_effects = extra_effects)
}

Setup Experiment Grid

Now we’re going to generate an experiment grid to examine the performance of the three models. First we’re going to randomly generate leaf effects with mean = 0, sd = 0.1. We’ll test all combinations of:

  1. Signal to noise ratio {0.5, 1, 2}: the ratio of the mean absolute leaf effect to the sd of the per-observation noise.
  2. Hierarchical effects {yes, no}: whether or not to add extra effects. If so, add an extra effect equal to the mean leaf effect at nodes in the lateral visual area, and subtract the mean leaf effect from every node in the primary visual area.

This will produce 6 trees to experiment on, which we will fit with each of the models.

exp_grid <-
  crossing(snr = c(.5,1,2)
         , hier_eff = c(TRUE, FALSE)) %>%
  mutate(leaf_effects =
           map(snr, ~ rnorm(length(leaves), 0, .1) %>% setNames(leaves))
       , mean_eff = map_dbl(leaf_effects, ~ mean(abs(.)))
       , noise = map2_dbl(mean_eff, snr, ~ .x/.y)
       , extra_effects = 
           map2(hier_eff, mean_eff
             , ~ `if`(.x
                    , c("Lateral visual area" = .y
                      , "Primary visual area" = -.y)
                    , NA)))

Now we’ll take the experiment grid and create the trees, then extract from the trees the data needed to fit the model.

exp_grid <-
  exp_grid %>%
  mutate(
    ## Triple bang turns a list into arguments
    ## This will create two list columns, effects - with the composite effects
    ## and tree - with the simulated tree
    !!! transpose(pmap(., function(leaf_effects, noise, extra_effects, ...){
      local_create_tree(noise, leaf_effects, extra_effects = extra_effects)
    }))
  ) %>%
  mutate(
    ## This adds hvft - the data frame for stan_lmer and
    ## hept - the data for the diffusion tree
    !!! transpose(map(.$tree, function(tr) tree_to_ept_data(tr, metadata, justLeaves = TRUE)))
  )

Compile the tree model

Let’s compile the tree model using a convenience function from hierarchyTrees.

## To view the model run:
## file.show(
##   system.file("models/effect_diffusion_tree.stan"
##             , package = "hierarchyTrees"))

edt_sm <- compile_models("effect_diffusion_tree")

Run the models

Now that the necessary data has been generated we can fit the models for the whole grid. This takes about an hour to run on a four core computer, so the results were saved and this code won’t be run when rendering this document.

Three models were fit:

  1. A “flat” stan_lmer model, y ~ sex + (sex | structure) + (1 | ID), with pooling across all structures, in code as slmer
  2. A hierarchical stan_lmer model, y ~ sex + (sex | structure) + (sex | parent) + (1 | ID), which additionally pools effects across parent structures, in code as hslmer
  3. The effect diffusion tree model, in code as edt

model_fits <-
  exp_grid %>%
  mutate(edt =
           map(hept 
             , function(d) sampling(edt_sm
                                  , control = list(max_treedepth = 15
                                                 , adapt_delta = .99)
                                  , data = d
                                  , open_progress = FALSE))
       , slmer =
           map(hvft
             , function(d) stan_lmer(scaled_vol ~ SEX + (SEX | name) + (1 | ID)
                                   , data = d
                                   , open_progress = FALSE))
       , hslmer =
           map(hvft
             , function(d) stan_lmer(scaled_vol ~ SEX + (SEX | name) + (SEX  | parent) + (1 | ID)
                                   , data =
                                       d %>%
                                       mutate(parent = case_when(is.na(parent) ~ "root"
                                                               , TRUE ~ parent))
                                   , open_progress = FALSE
                                     , control = list(adapt_delta = .99)))
         )

saveRDS(model_fits, "simulation_results.rds")

There were divergences in 3 of the hslmer runs: in two cases only a single divergence, but in one case there were 29. In the event that the hslmer results are better than those of the other two models, these divergences will be investigated in more detail.

Summarizing performance

To see how the three models perform, we’re going to look at performance on a few measures.

  1. Expected log predictive density (ELPD) from Pareto Smoothed Importance Sampling Leave-One-Out Cross-Validation (PSIS-LOOCV) using the loo package.
  2. Median pointwise log-likelihood of the volume data given the models.
  3. Mean squared error of the predicted volume, \(\frac{1}{N}\sum_{i=1}^N(y_i - \hat{y}_i)^2\)
  4. Fisher-transformed correlation between the estimated parameters and the simulated parameters at the leaves, \(\mathrm{atanh}(\mathrm{cor}(\boldsymbol{\beta}, \boldsymbol{\hat{\beta}}))\), where \(\boldsymbol{\hat{\beta}}\) is the vector of posterior median effects at the leaves.
  5. Approximate log-likelihood of the simulated effects given the model. This is computed as \(\sum_s \log \left( \mathcal{N}(\beta_s \mid \mathbb{E}(\hat{\beta_s}), \mathrm{sd}(\hat{\beta_s})) \right)\) where \(s\) indexes structures; the mean and sd are computed from the posterior samples.

All measures except ELPD are computed within sample. The ELPD measures the generalization of the models. The log-likelihood of the volumes, and mean squared error are somewhat redundant alternatives for measuring model fit. The effect correlation and log-likelihood of the simulated effects measure the ability to recover the known input parameters.
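For concreteness, the mean squared error and Fisher-transformed correlation can be sketched in a few lines (toy vectors, not the simulation output):

```r
## Fabricated effect vectors for illustration
beta_true <- c(0.10, -0.05, 0.20, 0.02)   # simulated leaf effects
beta_hat  <- c(0.08, -0.02, 0.18, 0.05)   # posterior median estimates

z_cor <- atanh(cor(beta_true, beta_hat))  # Fisher-transformed correlation
mse   <- mean((beta_true - beta_hat)^2)   # mean squared error
```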

Additionally, mcmc_areas figures were generated for examining the posterior distributions of the effects. Warnings are suppressed in the following code block; the suppressed warnings are from ggplot2 and indicate that the axis label gets over-written.

model_results <- readRDS("simulation_results.rds")

model_results <-
  model_results %>%
  ## Note pmaps in this mutate can be here because they don't depend on
  ## preceding results within the mutation 
  mutate(
    # Compute the PSIS-LOOCV
    lmer_loo = map(slmer, ~ {
      log_lik(.) %>%
        loo(.
          , r_eff =
              relative_eff(exp(.) ## dup . to first arg and exp
                         , chain_id = rep(1:4, each = 1000)))
    })

  , hlmer_loo = map(hslmer, ~ {
      log_lik(.) %>%
        loo(.
          , r_eff =
              relative_eff(exp(.) ## dup . to first arg and exp
                         , chain_id = rep(1:4, each = 1000)))
  })
    
  , edt_loo = map2(edt, hept, ~ {
      logLik_ept(.x, .y$y) %>%
        loo(.
          , r_eff =
              relative_eff(exp(.) ## dup . to first arg and exp
                         , chain_id = rep(1:4, each = 1000)))
    })
    
  # Extract the LOO expected log predictive density
  , lmer_elpd = map_dbl(lmer_loo, ~ .$estimates["elpd_loo", "Estimate"])
  , hlmer_elpd = map_dbl(hlmer_loo, ~ .$estimates["elpd_loo", "Estimate"])
  , edt_elpd = map_dbl(edt_loo, ~ .$estimates["elpd_loo", "Estimate"])
    
    # Extract model summary data 
  , lmer_res =
      pmap(., function(slmer, tree, sds, ...) 
        get_sglm_results(slmer, tree, sds, justLeaves = TRUE))
  , hlmer_res =
      pmap(., function(hslmer, tree, sds, ...) 
        get_hsglm_results(hslmer, tree, sds))
  , edt_res =
      pmap(., function(edt, tree, sds, ...)
        get_ept_results(edt, tree, sds, justLeaves = TRUE))
    
    # Compute fisher transformed correlation at the leaves
  , lmer_cor = 
      map2_dbl(lmer_res, effects, ~ atanh(cor(.x$effects[leaves], .y[leaves])))
  , hlmer_cor = 
      map2_dbl(hlmer_res, effects, ~ atanh(cor(.x$effects[leaves], .y[leaves])))
  , edt_cor =
      map2_dbl(edt_res, effects, ~ atanh(cor(.x$effects[leaves], .y[leaves])))
    
    # Get log-likelihood of the volumes given the models
  , lmer_logLik = map_dbl(slmer, ~ median(log_lik(.)))
  , hlmer_logLik = map_dbl(hslmer, ~ median(log_lik(.)))
  , edt_logLik = map2_dbl(edt, hept, ~ median(logLik_ept(.x, .y$y)))
    
    # Compute the mean square error
  , lmer_mse = map2_dbl(slmer, hept, ~ mean((.y$y - fitted(.x))^2))
  , hlmer_mse = map2_dbl(hslmer, hept, ~ mean((.y$y - fitted(.x))^2))
  , edt_mse = map2_dbl(edt, hept, ~ mean((.y$y - fitted_ept(.x))^2))
  ) %>%
  ## Note pmaps in this block require data from the preceding mutations
  mutate(
    ## Generate ggplot objects with the effect posteriors
    lmer_areas = 
      pmap(., function(lmer_res, effects, sds, tree, ...)
        effect_areas(lmer_res, effects, sds, tree, justLeaves = TRUE))
  , hlmer_areas = 
      pmap(., function(hlmer_res, effects, sds, tree, ...)
        effect_areas(hlmer_res, effects, sds, tree, justLeaves = TRUE))
  , edt_areas =
      pmap(., function(edt_res, effects, sds, tree, ...)
        effect_areas(edt_res, effects, sds, tree, justLeaves = TRUE))       
    ## Compute the pointwise effect log-likelihoods
  , lmer_pwllb = 
      pmap(., function(lmer_res, effects, sds, tree, ...)
        pw_effect_loglik(lmer_res, effects, sds, tree, justLeaves = TRUE))
  , hlmer_pwllb = 
      pmap(., function(hlmer_res, effects, sds, tree, ...)
        pw_effect_loglik(hlmer_res, effects, sds, tree, justLeaves = TRUE))
  , edt_pwllb =
      pmap(., function(edt_res, effects, sds, tree, ...)
        pw_effect_loglik(edt_res, effects, sds, tree, justLeaves = TRUE))
    
    ## Sum the pointwise log-likelihood
  , lmer_llb = map_dbl(lmer_pwllb, sum)
  , hlmer_llb = map_dbl(hlmer_pwllb, sum)
  , edt_llb = map_dbl(edt_pwllb, sum)
  )

Simulation Results

Now we’ll transform the results into something a little more readable:

model_results %>%
  select_if(~ !is.list(.)) %>%
  gather(measure, score, lmer_elpd:edt_llb) %>%
  separate(measure, into = c("model", "measure")) %>%
  spread(measure, score)
## # A tibble: 18 x 10
##      snr hier_eff mean_eff  noise model   cor  elpd    llb logLik   mse
##    <dbl> <lgl>       <dbl>  <dbl> <chr> <dbl> <dbl>  <dbl>  <dbl> <dbl>
##  1 0.500 F          0.0699 0.140  edt    1.22  -549 -5.34  -1.08  0.775
##  2 0.500 F          0.0699 0.140  hlmer  1.25  -549 -4.26  -1.08  0.775
##  3 0.500 F          0.0699 0.140  lmer   1.25  -548 -4.23  -1.07  0.775
##  4 0.500 T          0.0824 0.165  edt    1.83  -546 -0.523 -1.03  0.748
##  5 0.500 T          0.0824 0.165  hlmer  1.65  -548  1.57  -1.03  0.777
##  6 0.500 T          0.0824 0.165  lmer   1.74  -547  1.54  -1.03  0.759
##  7 1.00  F          0.0820 0.0820 edt    2.39  -491 -4.10  -0.895 0.540
##  8 1.00  F          0.0820 0.0820 hlmer  2.22  -493 -0.836 -0.897 0.551
##  9 1.00  F          0.0820 0.0820 lmer   2.29  -492  0.174 -0.895 0.541
## 10 1.00  T          0.0842 0.0842 edt    2.42  -493 -5.10  -0.918 0.554
## 11 1.00  T          0.0842 0.0842 hlmer  1.69  -495 -1.16  -0.918 0.556
## 12 1.00  T          0.0842 0.0842 lmer   1.69  -494 -1.08  -0.917 0.554
## 13 2.00  F          0.121  0.0603 edt    2.86  -408 -5.28  -0.686 0.348
## 14 2.00  F          0.121  0.0603 hlmer  1.75  -409 -0.578 -0.686 0.349
## 15 2.00  F          0.121  0.0603 lmer   1.75  -408 -0.490 -0.685 0.349
## 16 2.00  T          0.0831 0.0416 edt    2.77  -359 -6.01  -0.535 0.269
## 17 2.00  T          0.0831 0.0416 hlmer  2.00  -359 -1.13  -0.535 0.269
## 18 2.00  T          0.0831 0.0416 lmer   2.00  -358 -0.950 -0.535 0.269

So the results are pretty interesting: it seems as though the diffusion tree model and both stan_lmer models perform equivalently on ELPD and mean squared error. The effect diffusion tree performs marginally better on the correlation between the simulated parameters and the median estimated parameters, but considerably worse on the approximate log-likelihood of the simulated parameters, suggesting the posteriors for the effect diffusion tree are much wider.

Let’s examine the posteriors for the estimated effects.

model_results$lmer_areas[[4]] + ggtitle("Effect posterior (stan_lmer)")
Figure 4: Posterior distributions for effect of sex at each node in the flat `stan_lmer` model. Pale bands show 50% credible intervals, dark blue lines show posterior medians, red line indicates the true effects.


model_results$hlmer_areas[[4]] + ggtitle("Effect posterior (stan_lmer + parents)")
Figure 5: Posterior distributions for the effect of sex at each node in the `stan_lmer` model that includes a parent effect. Pale bands show 50% credible intervals, dark blue lines show posterior medians, red line indicates the true effects.


model_results$edt_areas[[4]] + ggtitle("Effect posterior (diffusion tree)")
Figure 6: Posterior distributions for the effect of sex at each node in the effect diffusion model. Pale bands show 50% credible intervals, dark blue lines show posterior medians, red line indicates the true effects


Looks like the diffusion tree is estimating substantially wider posterior distributions than the stan_lmer models.

Posterior Differences

Part of the motivation for the diffusion tree is the ability to estimate the difference between parent and child nodes directly. This allows us to detect regions where the effect is changing.

difference_tree <- Clone(model_results$tree[[4]])
node_names <- difference_tree$Get("name")
diff_posterior <- as.matrix(model_results$edt[[4]], "b")

difference_tree$Do(function(n){
  node_num <- match(n$name, node_names)
  n$b_diff <-
    diff_posterior[, paste0("b[", as.integer(node_num), ",2]") ]
})

difference_tree$Do(function(n){
  if(!isRoot(n))
    n$b_diff <- n$b_diff - n$parent$b_diff
}, traversal = "post-order")

Now each node has an extra attribute b_diff containing the posterior difference between its effect and its parent’s effect. The posterior differences across the whole tree look like:

difference_matrix <-
  difference_tree$Get("b_diff", simplify = FALSE) %>%
  simplify2array

mcmc_areas(difference_matrix) + ggtitle("Effect change from parent")
Figure 7: Posterior distributions for the difference in sex effects between a given node and its parent. Pale bands show 50% credible intervals, dark blue lines show posterior medians.


Let’s pull out one leaf node and its entire ancestry to see how this looks.

parent_of <-
  function(node, name)
    name %in% node$Get("name")

left_diff_matrix <-
  difference_tree$Get("b_diff"
                    , pruneFun =
                        function(n)
                          parent_of(n, "left Lateral visual area")
                    , simplify = FALSE) %>%
  simplify2array

left_diff_matrix %>%
  mcmc_areas + ggtitle("Effect change from parent (left lateral visual)")
Figure 8: Posterior distributions for the difference in sex effects between a node and its parent for the left lateral visual area and its ancestor nodes. Pale bands show 50% credible intervals, dark blue lines show posterior medians.


Here it looks like there is a change between the visual areas (the root node) and the lateral visual area (where we induced a hierarchical effect). The left lateral visual area changed little relative to its parent.

Now let’s examine the right side

right_diff_matrix <-
  difference_tree$Get("b_diff"
                    , pruneFun =
                        function(n)
                          parent_of(n, "right Lateral visual area")
                    , simplify = FALSE) %>%
  simplify2array

right_diff_matrix %>%
  mcmc_areas + ggtitle("Effect change from parent (right lateral visual)")
Figure 9: Posterior distributions for the difference in sex effects between a node and its parent for the right lateral visual area and its ancestor nodes. Pale bands show 50% credible intervals, dark blue lines show posterior medians.


It looks like the right lateral visual area changes even more in the same direction as its parent.

Interpreting Posterior Differences

After examining the difference posteriors we can make decisions regarding where in the brain taxonomy an effect originates by checking whether the credible interval for the difference excludes zero. This gives us two ways to examine the localization of effects in the brain: first, standard structure-specific effect estimation; and second, between-structure difference estimation to identify effect sources.
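A minimal sketch of that decision rule, applied to a fabricated posterior for a single difference, using the same 50% intervals shown in the figures:

```r
set.seed(2)

## Fabricated posterior draws for one node's b_diff
diff_draws <- rnorm(4000, mean = 0.2, sd = 0.05)

ci <- quantile(diff_draws, c(0.25, 0.75))          # 50% credible interval
effect_originates_here <- ci[1] > 0 || ci[2] < 0   # TRUE if the interval excludes zero
```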

On Real Data

Let’s try the posterior difference strategy on some real data to see if we can recover known regions of sex effects. Here we’ll read in an external run of the full data set of 140 mice using the entire tree. The full tree has 336 leaf nodes with a maximum depth of 12. Four models were run on the real data:

  1. stan_lmer with y ~ sex + (sex | structure) + (1 | ID), referred to as the “flat” model, in code as lmer.
  2. stan_lmer with y ~ sex + (sex | structure) + (sex | parent) + (1 | ID), referred to as the “parental” model, in code as hlmer.
  3. stan_lmer with y ~ sex + (sex | structure) + (sex | parent) + (sex | grandparent) + (1 | ID), referred to as the “grand parental” model, in code as h2lmer.
  4. The effect diffusion model.
full_tree <- readRDS("full_tree.rds")
full_metadata <- read.csv("full_metadata.csv", stringsAsFactors = FALSE)
real_edt <- readRDS("real-data-edt.rds")
real_lmer <- readRDS("real-data-lmer.rds")
real_hlmer <- readRDS("real-data-hlmer.rds")
real_h2lmer <- readRDS("real-data-h2lmer.rds")

real_data <- tree_to_ept_data(full_tree, full_metadata, justLeaves = TRUE)
real_nodes <- full_tree$Get("name", filterFun = isLeaf)
real_sds <-
  real_lmer$data %>%
  group_by(name) %>%
  slice(1) %>%
  ungroup() %>%
  with(setNames(sd_vol, name)) %>%
  .[real_nodes]

We’ll use Michael Betancourt’s utility functions to check that all models fit successfully.

source("https://raw.githubusercontent.com/betanalpha/knitr_case_studies/master/rstan_workflow/stan_utility.R")

Flat model:

check_all_diagnostics(real_lmer$stanfit)
## [1] "n_eff / iter looks reasonable for all parameters"
## [1] "Rhat looks reasonable for all parameters"
## [1] "0 of 4000 iterations ended with a divergence (0%)"
## [1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
## [1] "E-BFMI indicated no pathological behavior"

Parental model:

check_all_diagnostics(real_hlmer$stanfit)
## [1] "n_eff / iter looks reasonable for all parameters"
## [1] "Rhat looks reasonable for all parameters"
## [1] "0 of 4000 iterations ended with a divergence (0%)"
## [1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
## [1] "E-BFMI indicated no pathological behavior"

Grand-parental model:

check_all_diagnostics(real_h2lmer$stanfit)
## [1] "n_eff / iter looks reasonable for all parameters"
## [1] "Rhat looks reasonable for all parameters"
## [1] "0 of 4000 iterations ended with a divergence (0%)"
## [1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
## [1] "E-BFMI indicated no pathological behavior"

Effect diffusion tree:

check_all_diagnostics(real_edt)
## [1] "n_eff / iter looks reasonable for all parameters"
## [1] "Rhat for parameter L_omega[1,1] is NaN!"
## [1] "Rhat for parameter L_omega[1,2] is NaN!"
## [1] "Rhat for parameter L_sigma[1,2] is NaN!"
## [1] "  Rhat above 1.1 indicates that the chains very likely have not mixed"
## [1] "0 of 4000 iterations ended with a divergence (0%)"
## [1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
## [1] "E-BFMI indicated no pathological behavior"

All diagnostics pass. The NaN! warnings for the effect diffusion tree arise from entries of the triangular matrices that are constant across draws, for which Rhat is undefined, and are to be expected.

Compare PSIS LOO-CV

Now we’ll compare these models on expected log predictive density as above. These results are also read from disk, since the loo calls use very large amounts of memory.

real_lmer_loo <-
  real_lmer %>%
  log_lik(.) %>%
  { loo(.
      , r_eff =
          relative_eff(exp(.) ## dup . to first arg and exp
                     , chain_id = rep(1:4, each = 1000))) }

real_hlmer_loo <-
  real_hlmer %>%
  log_lik(.) %>%
  { loo(.
      , r_eff =
          relative_eff(exp(.) ## dup . to first arg and exp
                     , chain_id = rep(1:4, each = 1000))) }

real_h2lmer_loo <-
  real_h2lmer %>%
  log_lik(.) %>%
  { loo(.
      , r_eff =
          relative_eff(exp(.) ## dup . to first arg and exp
                     , chain_id = rep(1:4, each = 1000))) }
  
    
real_edt_loo <-
  real_edt %>%
  logLik_ept(., real_data$hept$y) %>%
  { loo(.
      , r_eff =
          relative_eff(exp(.) ## dup . to first arg and exp
                     , chain_id = rep(1:4, each = 1000))) }

real_lmer_res <-
  get_sglm_results(real_lmer, full_tree, real_sds, justLeaves = TRUE)

real_hlmer_res <-
  get_hsglm_results(real_hlmer, full_tree, real_sds)

real_h2lmer_res <-
  get_h2sglm_results(real_h2lmer, full_tree, real_sds)

real_edt_res <-
  get_ept_results(real_edt, full_tree, real_sds, justLeaves = TRUE)

save(real_lmer_loo
   , real_hlmer_loo
   , real_h2lmer_loo
   , real_edt_loo
   , real_lmer_res
   , real_hlmer_res
   , real_h2lmer_res
   , real_edt_res
   , file = "real-data-results.rda")
load("real-data-results.rda")

Flat model:

real_lmer_loo
## 
## Computed from 4000 by 47040 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -59423.2 224.8
## p_loo       324.0   3.2
## looic    118846.4 449.7
## ------
## Monte Carlo SE of elpd_loo is 0.3.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

Parental model:

real_hlmer_loo
## 
## Computed from 4000 by 47040 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -59360.4 225.0
## p_loo       262.7   2.6
## looic    118720.8 450.0
## ------
## Monte Carlo SE of elpd_loo is 0.3.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

Grand-parental model:

real_h2lmer_loo
## 
## Computed from 4000 by 47040 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -59354.4 224.9
## p_loo       243.3   2.4
## looic    118708.9 449.9
## ------
## Monte Carlo SE of elpd_loo is 0.3.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

Effect diffusion tree:

real_edt_loo
## 
## Computed from 4000 by 47040 log-likelihood matrix
## 
##          Estimate    SE
## elpd_loo -59358.0 225.0
## p_loo       257.5   2.6
## looic    118715.9 449.9
## ------
## Monte Carlo SE of elpd_loo is 0.3.
## 
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.

So the effect diffusion tree performs marginally better than the flat and parental stan_lmer models, with an edge in both ELPD and effect correlation over the flat model and a virtually identical ELPD to the parental model. It has a slightly worse ELPD, and more effective parameters, than the grand-parental model.

Localizing Sex Effects

The ultimate goal of these models is to localize where in the brain there is evidence for an effect of sex. Since the number of nodes is large, we’ll subset down to the parameters where the 90% credible interval doesn’t overlap zero.

lmer_post_intervals <- posterior_interval(real_lmer_res$b_post)
lmer_nz_intervals <-
  lmer_post_intervals[sign(lmer_post_intervals[,1]) == sign(lmer_post_intervals[,2]),]

hlmer_post_intervals <- posterior_interval(real_hlmer_res$b_post)
hlmer_nz_intervals <-
  hlmer_post_intervals[sign(hlmer_post_intervals[,1]) == sign(hlmer_post_intervals[,2]),]

h2lmer_post_intervals <- posterior_interval(real_h2lmer_res$b_post)
h2lmer_nz_intervals <-
  h2lmer_post_intervals[sign(h2lmer_post_intervals[,1]) == sign(h2lmer_post_intervals[,2]),]

edt_post_intervals <- posterior_interval(real_edt_res$b_post)
edt_nz_intervals <-
  edt_post_intervals[sign(edt_post_intervals[,1]) == sign(edt_post_intervals[,2]),]

Let’s see how many structures are identified by each model:

flat model:

nrow(lmer_nz_intervals)
## [1] 40

parental model:

nrow(hlmer_nz_intervals)
## [1] 108

grand-parental model:

nrow(h2lmer_nz_intervals)
## [1] 20

effect diffusion tree:

nrow(edt_nz_intervals)
## [1] 60

So the effect diffusion model identifies 50% more structures with non-zero sex effects than the flat model, while the parental model recovers nearly twice as many structures as the effect diffusion tree. The grand-parental model identifies the fewest structures, half as many as the flat model.

Let’s compare the lists of structures across the four models by computing the Jaccard similarity of each pair of sets. The Jaccard similarity is defined as the ratio of the size of the intersection of two sets to the size of their union: \(\frac{|A\cap B|}{|A\cup B|}\).
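For a single pair of structure-name sets the definition reduces to a one-liner, shown here on toy sets before the pairwise computation over all four models:

```r
## Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

jaccard(c("amygdala", "striatum"), c("striatum", "isocortex"))  ## 1/3
```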

nz_interval_list <-
  list(lmer = lmer_nz_intervals
     , hlmer = hlmer_nz_intervals
     , h2lmer = h2lmer_nz_intervals
     , edt = edt_nz_intervals)

sapply(nz_interval_list
       , function(el1)
         sapply(nz_interval_list
              , function(el2) length(intersect(rownames(el1), rownames(el2))) /
                              length(union(rownames(el1), rownames(el2)))))
##             lmer     hlmer     h2lmer        edt
## lmer   1.0000000 0.3333333 0.13207547 0.36986301
## hlmer  0.3333333 1.0000000 0.18518519 0.42372881
## h2lmer 0.1320755 0.1851852 1.00000000 0.08108108
## edt    0.3698630 0.4237288 0.08108108 1.00000000

Interestingly, the most similar pairs of identified-structure sets are the effect diffusion tree with the parental stan_lmer, followed by the effect diffusion tree with the flat stan_lmer. The stan_lmer including grandparent information was most dissimilar from the rest and identified the fewest structures.

Examining known sexually dimorphic regions

The literature on mouse sex differences identifies three main structures that ought to be sexually dimorphic. These are the bed nuclei of the stria terminalis, the medial preoptic nucleus, and the medial amygdala (Qiu et al. 2018).

Let’s see how many of these each model recovers. A model that recovers all of them should find six structures: each of the three in both the left and right hemispheres.

find_known_dimorph_regions <-
  function(structures)
    grep("Bed nuclei|Medial preoptic|Medial amygdala"
       , structures
       , value = TRUE)

lmer_dimorph <- find_known_dimorph_regions(rownames(lmer_nz_intervals))
hlmer_dimorph <- find_known_dimorph_regions(rownames(hlmer_nz_intervals))
h2lmer_dimorph <- find_known_dimorph_regions(rownames(h2lmer_nz_intervals))
edt_dimorph <- find_known_dimorph_regions(rownames(edt_nz_intervals))

list(lmer = length(lmer_dimorph)
   , hlmer = length(hlmer_dimorph)
   , h2lmer = length(h2lmer_dimorph)
   , edt = length(edt_dimorph))
## $lmer
## [1] 4
## 
## $hlmer
## [1] 6
## 
## $h2lmer
## [1] 2
## 
## $edt
## [1] 4

So we see the stan_lmer with parent information recovers all six, the flat stan_lmer and the effect diffusion tree each identify four of six, and the stan_lmer with grandparent information recovers only two.

Viewing all identified structures

We’ll compare the identified structures for the parental model and the effect diffusion model, as those two identified the greatest number of structures.

mcmc_areas(real_hlmer_res$b_post[,rownames(hlmer_nz_intervals)])
Figure 10: Posterior distributions for the effect of sex at each leaf node in the parental stan_lmer model where the 90% credible interval does not bound zero. Pale bands show 50% credible intervals, dark blue lines show posterior medians.

mcmc_areas(real_edt_res$b_post[,rownames(edt_nz_intervals)])
Figure 11: Posterior distributions for the effect of sex at each leaf node in the effect diffusion model where the 90% credible interval does not bound zero. Pale bands show 50% credible intervals, dark blue lines show posterior medians.

As noted in the similarity analysis above, most of the structures identified by the effect diffusion tree are also identified by the parental model, with many additional structures identified only by the parental model.

Structure differences

Now let’s see which structures come up as having credible intervals of the differences not bounding zero.

real_difference_tree <- Clone(full_tree)
real_node_names <- real_difference_tree$Get("name")
real_diff_posterior <- as.matrix(real_edt, "b")

real_difference_tree$Do(function(n){
  node_num <- match(n$name, real_node_names)
  n$b_diff <-
    real_diff_posterior[, paste0("b[", node_num, ",2]") ]
})

real_difference_tree$Do(function(n){
  if(!isRoot(n))
    n$b_diff <- n$b_diff - n$parent$b_diff
}, traversal = "post-order")

real_post_diff <- real_difference_tree$Get("b_diff", simplify = FALSE) %>% simplify2array

real_post_diff_intervals <- posterior_interval(real_post_diff)
real_nz_diff_intervals <-
  real_post_diff_intervals[sign(real_post_diff_intervals[,1]) ==
                           sign(real_post_diff_intervals[,2]) , ]

mcmc_areas(real_post_diff[,rownames(real_nz_diff_intervals)])
Figure 12: Posterior distributions for the difference in sex effects between a node and its parent such that the 90% posterior interval does not bound zero. Pale bands show 50% credible intervals, dark blue lines show posterior medians.
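The child-minus-parent differencing performed on the tree above can be sketched without data.tree, using a toy parent-index vector (values here are hypothetical). Referencing the original effects directly avoids the ordering concern that the post-order traversal handles in the tree version:

```r
## Toy three-node chain: root -> A -> B, with per-node effects.
b      <- c(root = 1.0, A = 1.5, B = 1.5)
parent <- c(root = NA, A = 1, B = 2)  # index of each node's parent

## Each non-root node's difference is its effect minus its parent's effect;
## the root keeps its own effect.
b_diff <- b
kids <- which(!is.na(parent))
b_diff[kids] <- b[kids] - b[parent[kids]]
b_diff  ## root: 1.0, A: 0.5, B: 0.0
```

Here B inherits its entire effect from A (difference of zero), while A introduces 0.5 beyond the root, mirroring how the tree version attributes effects to the level at which they originate.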

Interestingly, the striatum-like amygdalar nucleus and the medial amygdalar nucleus are detected as changing relative to their parents; the striatum-like amygdalar nucleus is the parent of the medial amygdalar nucleus, which is one of the known sexually dimorphic regions noted above. This suggests that the sexual dimorphism observed in the medial amygdala is at least in part a consequence of an effect higher in the brain structure taxonomy. Also of note, the isocortex has a more negative effect relative to its parent; negative effects indicate regions where females have larger brain structures, and the isocortex is known to be diffusely larger in females (Qiu et al. 2018).

Concluding and Future Directions

In this notebook we’ve seen that, for modelling effects over the brain taxonomy, the effect diffusion tree provides performance comparable to stan_lmer, with the additional advantage of estimating the difference between each child structure and its parent. The initial findings are promising, with the effect diffusion tree identifying 4/6 known sexually dimorphic structures and recapitulating the finding of enlarged cortices in female mice dispersed across many structures. It is interesting that the parental model identified more structures than both the effect diffusion and grand-parental models, which may indicate that the primary advantage we gain from the brain taxonomy is pooling between the left and right lateralizations of the same structure.

Considerable work remains to improve the effect diffusion model, particularly improving its computational performance by vectorizing the underlying Stan code. Work also remains to validate the statistical performance of these models. More extensive simulation work is planned to identify under what conditions each model may be preferable and to get a sense of the performance of the models in terms of type-M and type-S error rates (Gelman and Carlin 2014).

In addition to the current set of models there are many alternatives for addressing these questions. One tempting approach is to apply a spatial Gaussian process prior over the model errors, although it is unclear how to combine this with the taxonomy tree to detect change points. Applying a Gaussian process over structures by tree distance was unsuccessful in some earlier experiments.

Region-based analyses are really just the beginning. Our acquired images contain tens of millions of individual observations (voxels, 3D pixels). Prematurely summarizing them into regions probably discards considerable useful information about effect localization. Ideally we’d like to extend this approach to the voxel level, possibly by using an expectation propagation strategy.


Acknowledgements

We’d like to thank Darren Fernandes and Lily Qiu for helping interpret the patterns of detected structures. We’d also like to thank Zsu Buchwald and the two anonymous StanCon reviewers for reading an earlier version of this document and providing very constructive feedback.

References

Bowman, F DuBois, Brian Caffo, Susan Spear Bassett, and Clinton Kilts. 2008. “A Bayesian Hierarchical Framework for Spatial Modeling of FMRI Data.” NeuroImage 39 (1). Elsevier: 146–56.

Chakravarty, M Mallar, Patrick Steadman, Matthijs C Eede, Rebecca D Calcott, Victoria Gu, Philip Shaw, Armin Raznahan, D Louis Collins, and Jason P Lerch. 2013. “Performing Label-Fusion-Based Segmentation Using Multiple Automatically Generated Templates.” Human Brain Mapping 34 (10). Wiley Online Library: 2635–54.

Dorr, A E, J P Lerch, S Spring, N Kabani, and R M Henkelman. 2008. “High resolution three-dimensional brain atlas using an average magnetic resonance image of 40 adult C57Bl/6J mice.” NeuroImage 42 (1): 60–69.

Evans, Alan C, Andrew L Janke, D Louis Collins, and Sylvain Baillet. 2012. “Brain Templates and Atlases.” Neuroimage 62 (2). Elsevier: 911–22.

Friston, Karl J, Daniel E Glaser, Richard NA Henson, S Kiebel, Christophe Phillips, and John Ashburner. 2002. “Classical and Bayesian Inference in Neuroimaging: Applications.” Neuroimage 16 (2). Elsevier: 484–512.

Friston, Karl J, William Penny, Christophe Phillips, S Kiebel, G Hinton, and John Ashburner. 2002. “Classical and Bayesian Inference in Neuroimaging: Theory.” NeuroImage 16 (2). Elsevier: 465–83.

Gelman, Andrew, and John Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51. doi:10.1177/1745691614551642.

Lein, Ed S, Michael J Hawrylycz, Nancy Ao, Mikael Ayres, Amy Bensinger, Amy Bernard, Andrew F Boe, et al. 2007. “Genome-Wide Atlas of Gene Expression in the Adult Mouse Brain.” Nature 445 (7124). Nature Publishing Group: 168.

Penny, William D, Nelson J Trujillo-Barreto, and Karl J Friston. 2005. “Bayesian FMRI Time Series Analysis with Spatial Priors.” NeuroImage 24 (2). Elsevier: 350–62.

Qiu, Lily R, Darren J Fernandes, Kamila U Szulc-Lerch, Jun Dazai, Brian J Nieman, Daniel H Turnbull, Jane A Foster, Mark R Palmert, and Jason P Lerch. 2018. “Mouse MRI Shows Brain Areas Relatively Larger in Males Emerge Before Those Larger in Females.” Nature Communications 9 (1). Nature Publishing Group: 2615.

Richards, Kay, Charles Watson, Rachel F Buckley, Nyoman D Kurniawan, Zhengyi Yang, Marianne D Keller, Richard Beare, et al. 2011. “Segmentation of the mouse hippocampal formation in magnetic resonance images.” NeuroImage 58 (3): 732–40.

Sidén, Per, Anders Eklund, David Bolin, and Mattias Villani. 2017. “Fast Bayesian Whole-Brain FMRI Analysis with Spatial 3D Priors.” NeuroImage 146. Elsevier: 211–25.

Ullmann, Jeremy F P, Charles Watson, Andrew L Janke, Nyoman D Kurniawan, and David C Reutens. 2013. “A segmentation protocol and MRI atlas of the C57BL/6J mouse neocortex.” NeuroImage 78 (September): 196–203.

Woolrich, Mark W, Saad Jbabdi, Brian Patenaude, Michael Chappell, Salima Makni, Timothy Behrens, Christian Beckmann, Mark Jenkinson, and Stephen M Smith. 2009. “Bayesian Analysis of Neuroimaging Data in FSL.” Neuroimage 45 (1). Elsevier: S173–S186.